Web Search Results Reranking using Metrics Based on Link Analysis and Content Filtering

The IUP Journal of Systems Management

Article Details

Pub. Date	:	May, 2007
Product Name	:	The IUP Journal of Systems Management
Product Type	:	Article
Product Code	:	IJSM10705
Author Name	:	Angelina Geetha, A Kannan and R Srinivasan
Availability	:	YES
Subject/Domain	:	Science and Technology
Download Format	:	PDF Format
No. of Pages	:	7

Price

For delivery in electronic format: Rs. 50; For delivery through courier (within India): Rs. 50 + Rs. 25 for Shipping & Handling Charges

Description

Search engines are tools that help retrieve documents matching user queries. The number of documents retrieved is usually large, and various algorithms have been developed to rerank the search results. This paper proposes an algorithm that combines link analysis and content filtering. When a user enters a query, the search engine returns a ranked list of results. In the proposed algorithm, every search result document is analyzed for its link structure, its inlink referral, and its page size. The contents of each document are analyzed for the degree of query match and for the distribution pattern of the keywords; from these, a frequency score and a position score are established. Using these link and content parameters, the search results are reranked. The experimental results showed that reranking with the proposed multiparameter algorithm improves the search results to a great extent.
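The combination of content and link parameters described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Result` fields, the scoring formulas, the weights, and the assumption that more inlinks and a larger page size count in a document's favor are all hypothetical choices made only to show the general shape of a multiparameter reranker.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    text: str
    inlinks: int      # pages linking to this result (assumed to be known)
    size_bytes: int   # page size (assumed to be known)


def frequency_score(text: str, query: str) -> float:
    """Fraction of the document's words that are query terms."""
    words = text.lower().split()
    terms = set(query.lower().split())
    if not words:
        return 0.0
    return sum(w in terms for w in words) / len(words)


def position_score(text: str, query: str) -> float:
    """Higher when query terms first appear early in the document."""
    words = text.lower().split()
    terms = set(query.lower().split())
    first_hits = [words.index(t) for t in terms if t in words]
    if not first_hits:
        return 0.0
    # A hit on the first word contributes 1.0; later hits contribute less.
    return sum(1 - i / len(words) for i in first_hits) / len(terms)


def rerank(results, query, weights=(0.4, 0.3, 0.2, 0.1)):
    """Sort results by a weighted sum of content and link scores.

    The weights are illustrative, not values from the paper.
    """
    w_freq, w_pos, w_link, w_size = weights
    # Normalize link and size parameters by the largest value in the list.
    max_in = max((r.inlinks for r in results), default=0) or 1
    max_size = max((r.size_bytes for r in results), default=0) or 1

    def score(r):
        return (w_freq * frequency_score(r.text, query)
                + w_pos * position_score(r.text, query)
                + w_link * r.inlinks / max_in
                + w_size * r.size_bytes / max_size)

    return sorted(results, key=score, reverse=True)
```

Calling `rerank(results, "web search")` on a list of `Result` objects returns the same objects ordered from highest to lowest combined score; changing the weights shifts the balance between the content and link parameters.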

Search engines have become an essential component of everyday life. Due to the rapid growth of the World Wide Web, the task of search engines has become very challenging. Search engines, which receive approximately 550 million requests per day (Grossman, 2003), play a vital role in finding and filtering the vast amount of data available on the web. They typically respond to keywords in the user query by retrieving relevant URLs from their own databases. Search engines like Google and AltaVista focus more on general search features, whereas search engines like WebSeek are more specific. Search features and the number of search results vary, and no single search engine provides the best index for all subjects. Most Internet searches are launched by casual users making simple queries on popular topics. Since such queries can generate tens of thousands of hits, ranking a document's relevance to a query is a core technology for search engines. Traditional information retrieval theory offers models for measuring the similarity between user-defined keywords and document contents. These models include the vector space model, cosine models, probability models, and fuzzy logic models.
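As an illustration of the similarity models mentioned above, the following minimal sketch computes a cosine similarity between a query and a document in a simple vector space. It assumes raw term counts as vector components and whitespace tokenization; real systems typically use weighted terms and more careful text processing, and the function name is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(query: str, document: str) -> float:
    """Cosine of the angle between term-count vectors of query and document."""
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    # Dot product over query terms; terms missing from the document count as 0.
    dot = sum(q[t] * d[t] for t in q)
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    norm_d = math.sqrt(sum(c * c for c in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)
```

A score near 1 means the document uses the query terms in similar proportions; a score of 0 means the two share no terms.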

The search results are ranked by the search engine based on predetermined algorithms. Through experiments, it was found that such rankings of search results are often not very relevant. This is because the documents retrieved by web crawlers are very large in number and are often unrelated to the user query. This has led to the need for reranking algorithms. A reranking algorithm accepts the search results from a search engine and reranks them based on various measures. Filters such as link filters and content filters are widely used for this purpose. This paper combines the content filter and context filter to implement the reranking algorithm. Relevancy ranking is the method used to order the result list so that the web documents most relevant to the user query are placed at the beginning.

Keywords

engines, algorithm, models, reranking, engine, content, queries, filter, relevant, retrieved, document, contents, filtering, retrieval, combines, algorithms, available, established, experiments, beginning, Google, Grossman, implementing, information, Inlink, launched, measuring, multiparameter, parameters, challenging, predetermined